[Figure: compute processors, each with memory, joined by an interconnection]
Author
Abstract
Many scientific applications that run on today's multiprocessors, such as weather forecasting and seismic analysis, are bottlenecked by their file-I/O needs. Even if the multiprocessor is configured with sufficient I/O hardware, the file-system software often fails to provide the available bandwidth to the application. Although libraries and enhanced file-system interfaces can make a significant improvement, we believe that fundamental changes are needed in the file-server software. We propose a new technique, disk-directed I/O, to allow the disk servers to determine the flow of data for maximum performance. Our simulations show that tremendous performance gains are possible. Indeed, disk-directed I/O provided consistent high performance that was largely independent of data distribution, obtained up to [value missing in source] of peak disk bandwidth, and was as much as [value missing in source] times faster than traditional parallel file systems.
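The abstract does not spell out how a disk server would "determine the flow of data", so the following is only a minimal sketch under a stated assumption: that a server which sees the whole set of requested blocks may service them in block-number order rather than in the order the compute processors happen to ask. The function names, the request list, and the head-travel cost model are hypothetical illustrations, not taken from the paper.

```python
def head_travel(order, start=0):
    """Total distance a (simplified) disk head moves visiting blocks in `order`."""
    pos, total = start, 0
    for block in order:
        total += abs(block - pos)
        pos = block
    return total


def arrival_order(requests):
    """Traditional flow: blocks are served in the order requests arrive."""
    return [block for _, block in requests]


def disk_directed_order(requests):
    """Assumed policy: the server knows every requested block and sorts them."""
    return sorted(block for _, block in requests)


# Hypothetical (compute processor id, block number) pairs, interleaved on arrival.
requests = [(0, 40), (1, 5), (2, 90), (0, 41), (1, 6), (2, 91)]

traditional = head_travel(arrival_order(requests))
directed = head_travel(disk_directed_order(requests))
print(f"arrival order: {traditional} blocks of head travel")
print(f"server-chosen order: {directed} blocks of head travel")
```

In this toy model the server-chosen order needs far less head movement because it does not depend on how the requests were interleaved. That is one plausible reading of the claim that performance was largely independent of data distribution. Real disks and the simulated system in the paper are considerably more complex.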
Similar resources
A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and migration to IPv6 addresses. IP address lookup involves computation of the Longest Prefix Match (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
Memory Bound vs. Compute Bound: A Quantitative Study of Cache and Memory Bandwidth in High Performance Applications
High performance applications depend on high utilizations of bandwidth and computing resources. They are most often limited by either memory or compute speed. Memory bound applications push the limits of the system bandwidth, while compute bound applications push the compute capabilities of the processor. Hierarchical caches are standard components of modern processors, designed to increase mem...
Beyond Processor-centric Operating Systems
By the end of the decade, computing designs will shift from a processor-centric architecture to a memory-centric architecture. At rack scale, we can expect a large pool of non-volatile memory (NVM) that will be accessed by heterogeneous and decentralized compute resources [3, 17]. Such memory-centric architectures will present challenges that today’s processor-centric OSes may not be able to add...
Cost-Performance Evaluation of SMP Clusters
Clusters of Personal Computers have been proposed as potential replacements for expensive compute servers. One limitation in the overall performance is the interconnection network. A possible solution is to use multiple processors on each node of the PC cluster. Parallel programs can then use the fast shared memory to exchange data within a node, and access the interconnection network to commun...
Journal title:
Volume, issue:
Pages: -
Publication date: 1994